Cohesiveness Relationships to Empower Keyword Search on Tree Data on the Web

Authors

  • Aggeliki Dimitriou
  • Ananya Dass
  • Dimitri Theodoratos
Abstract

Keyword search has for several years been the most popular technique for retrieving information from semistructured data on the web. The reason for this unprecedented success is well known and twofold: (1) the user does not need to master a complex query language to specify her requests for data, and (2) she does not need any knowledge of the structure of the data sources. However, these advantages come with two drawbacks: (1) as a result of the imprecision of keyword queries, there is usually a huge number of candidate results, of which only very few match the user’s intent. Unfortunately, the existing semantics are ad hoc and generally fail to “guess” the user’s intent. (2) As the number of keywords and the size of the data grow, the existing approaches do not scale satisfactorily. In this paper, we focus on keyword search on tree data and we introduce keyword queries which can express cohesiveness relationships. Intuitively, a cohesiveness relationship on keywords indicates that the instances of these keywords in a query result should form a cohesive whole, where instances of the other keywords do not interpolate. Cohesive keyword queries also allow keyword repetition and nesting of cohesiveness relationships. Most importantly, despite their increased expressiveness, they enjoy both advantages of plain keyword search. We provide formal semantics for cohesive keyword queries on tree data that rank query results by the proximity of the keyword instances. We design a stack-based algorithm which builds a lattice of keyword partitions to efficiently compute keyword queries and further leverages cohesiveness relationships to significantly reduce the dimensionality of the lattice. We implemented our approach and ran extensive experiments to measure the effectiveness of keyword queries and the efficiency and scalability of our algorithm. Our results demonstrate that our approach outperforms previous filtering semantics and that our algorithm scales smoothly, achieving interactive response times on queries of 20 frequent keywords on large datasets.
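
The intuition behind a cohesiveness relationship can be illustrated with a small Python sketch. It is not the paper's formal semantics or its stack-based algorithm. It assumes one simple reading of "do not interpolate": a keyword group is cohesive if the minimum connecting subtree of its instances contains no instance of a keyword outside the group. The tree, the node numbers, and the names `PARENT`, `connecting_subtree`, and `is_cohesive` are all hypothetical.

```python
# Hypothetical tree, stored as child -> parent (the root maps to None):
# 1 bib
# └─ 2 paper
#    ├─ 3 title
#    ├─ 4 author ─ 5 first, 6 last
#    └─ 7 author ─ 8 first, 9 last
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 2, 8: 7, 9: 7}


def path_to_root(parent, node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def connecting_subtree(parent, nodes):
    """Nodes of the minimum subtree connecting `nodes`, rooted at their LCA."""
    paths = [path_to_root(parent, n) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    # The lowest common ancestor is the first common node on a path to the root.
    lca = next(n for n in paths[0] if n in common)
    subtree = set()
    for p in paths:
        subtree.update(p[: p.index(lca) + 1])
    return subtree


def is_cohesive(parent, instance, group):
    """Assumed reading: no instance of a keyword outside `group` lies in
    the connecting subtree of the group's own instances.

    `instance` maps each query keyword to the node where it occurs.
    """
    subtree = connecting_subtree(parent, [instance[k] for k in group])
    return all(node not in subtree
               for kw, node in instance.items() if kw not in group)


# "John" and "Smith" under the same author; the other keywords are elsewhere.
tight = {"John": 5, "Smith": 6, "George": 8, "Brown": 9}
# "John" and "Smith" under different authors; the path between them
# passes through node 7, which is an instance of the keyword "author".
loose = {"John": 5, "Smith": 9, "author": 7}

print(is_cohesive(PARENT, tight, {"John", "Smith"}))
print(is_cohesive(PARENT, loose, {"John", "Smith"}))
```

Under this assumed reading, the first instance satisfies the relationship on (John Smith) and the second does not, because an instance of another keyword lies on the path connecting the two grouped instances.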

Similar resources

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

Cohesive Keyword Search on Tree Data

Keyword search is the most popular querying technique on semistructured data. Keyword queries are simple and convenient. However, as a consequence of their imprecision, there is usually a huge number of candidate results of which only very few match the user’s intent. Unfortunately, the existing semantics for keyword queries are ad-hoc and they generally fail to “guess” the user intent. Therefo...

An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

A Structure-Based Search Engine for Phylogenetic Databases

Phylogenetic trees are essential for understanding the relationships among organisms or taxa. Many of the current techniques for searching phylogenetic repositories allow the user to perform a keyword-type search or an aligned sequence data search, or to browse a hierarchical list of taxa. Here we describe a new search engine that allows the user to present an example phylogeny, or a query tree...

Fuzzy retrieval of encrypted data by multi-purpose data-structures

The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware and human resources, and outsourcing data management and maintenance to an external organization in the form of cloud storage services, are two common approaches to overcome these challenges; the first approach costs of...

Journal:
  • CoRR

Volume abs/1508.04957

Publication date 2015